A Mask Estimation Method Integrating Data Field Model for Speech Enhancement
Abstract
In most approaches based on computational auditory scene analysis (CASA), the ideal binary mask (IBM) is used for noise reduction. In practice, however, the IBM is almost impossible to obtain exactly. Errors in IBM estimation can severely violate the smooth evolution of speech, because energy goes missing in many speech-dominated time-frequency (T-F) units. To reduce this error, the ideal ratio mask (IRM), obtained by modeling the spatial dependencies of the speech spectrum, is used as the optimal target mask, because a predicted ratio mask is less sensitive to estimation error than a predicted binary mask. In this paper, we introduce a data field (DF) to model the spatial dependencies of the cochleagram and obtain the ratio mask. First, initial T-F units of noise and speech are obtained from the noisy speech. Next, the forms of the potentials of noise and speech are calculated. Their optimal potentials, which reflect the respective distributions of their potential fields, are then obtained using the optimal influence factors of speech and noise. Finally, the potentials of speech and noise are used to obtain the ratio mask. Experimental results show that the proposed method achieves better speech quality than the reference methods.
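The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes a Gaussian potential function, fixed influence factors `sigma_s` and `sigma_n` (the paper optimizes them), and a ratio of the form speech potential over total potential. The function names `ideal_masks`, `potential`, and `data_field_ratio_mask` are hypothetical, and the IRM shown is one common square-root definition.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def ideal_masks(speech_energy, noise_energy, lc_db=0.0):
    """Return (IBM, IRM) given the true speech and noise energy per T-F unit."""
    snr_db = 10.0 * np.log10((speech_energy + EPS) / (noise_energy + EPS))
    ibm = (snr_db > lc_db).astype(float)  # 1 where speech dominates
    # One common IRM definition (square-root form); an assumption here.
    irm = np.sqrt(speech_energy / (speech_energy + noise_energy + EPS))
    return ibm, irm


def potential(mass, sigma):
    """Gaussian data-field potential on the T-F grid (assumed form).

    Every unit with nonzero `mass` acts as a source whose contribution at
    another unit decays as exp(-(distance / sigma) ** 2).
    """
    n_chan, n_frames = mass.shape
    chan, frame = np.mgrid[0:n_chan, 0:n_frames]
    pos = np.stack([chan.ravel(), frame.ravel()], axis=1).astype(float)
    # Squared Euclidean distance between every pair of T-F units.
    sq_dist = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-sq_dist / sigma ** 2) @ mass.ravel()
    return phi.reshape(mass.shape)


def data_field_ratio_mask(mixture_energy, speech_units, sigma_s=1.0, sigma_n=1.0):
    """Ratio mask from speech and noise potentials (assumed combination rule).

    `speech_units` is an initial 0/1 labeling of T-F units; the remaining
    units are treated as noise. The influence factors are fixed here.
    """
    speech_mass = mixture_energy * speech_units
    noise_mass = mixture_energy * (1.0 - speech_units)
    phi_s = potential(speech_mass, sigma_s)
    phi_n = potential(noise_mass, sigma_n)
    return phi_s / (phi_s + phi_n + EPS)


# Toy cochleagram: a block of speech energy over a flat noise floor.
speech = np.zeros((4, 6))
speech[1:3, 1:4] = 4.0
noise = np.ones((4, 6))
ibm, irm = ideal_masks(speech, noise)
# The IBM stands in for the initial labeling, which the paper estimates
# from the noisy speech itself.
mask = data_field_ratio_mask(speech + noise, ibm)
```

Because each unit's mask value depends on the potentials contributed by its neighbors, an isolated labeling error is smoothed rather than passed through directly, which is the motivation the abstract gives for modeling spatial dependencies. The dense pairwise distance matrix is only practical for small grids; a real cochleagram would need a truncated neighborhood.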
Similar resources
Speech enhancement based on hidden Markov model using sparse code shrinkage
This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian combination (for clean speech and noise, respectively) in the HMM ...
Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection
For single-channel speech enhancement, the mask learning based approach using neural networks has been shown to outperform the feature mapping approach and to be effective as a pre-processor for automatic speech recognition. However, its assumption that the mixture and the clean reference must have corresponding scales does not hold in data collected from the real world, and thus leads to significant p...
Binary mask estimation based on frequency modulations
In this paper, a binary mask estimation algorithm is proposed based on modulations of speech. A multi-resolution spectrotemporal analytical auditory model is utilized to extract modulation features to estimate the binary mask, which is often used in speech segregation applications. The proposed method estimates noise from the beginning of each test sentence, a common approach seen in many conve...
Frequency Dependent Statistical Model for the Suppression of Late Reverberations
Suppression of late reverberations is a challenging problem in reverberant speech enhancement. A promising recent approach to this problem is to apply a spectral subtraction mask to the spectrum of the reverberant speech, where the spectral variance of the late reverberations was estimated based on a frequency independent statistical model of the decay rate of the late reverberations. In this p...
Eigenvector-Based Speech Mask Estimation Using Logistic Regression
In this paper, we use logistic regression to learn a speech mask from the dominant eigenvector of the Power Spectral Density (PSD) matrix of a multi-channel speech signal corrupted by ambient noise. We employ this speech mask to construct the Generalized Eigenvalue (GEV) beamformer and a Wiener postfilter. Further, we extend the beamformer to compensate for speech distortions. We do not make ...
Publication date: 2017